A Finite-State Parser with Dependency Structure Output

Author

  • David Elworthy
Abstract

Dependency parsers and finite-state parsers are both capable of rapid and robust parsing of natural language. Dependency parsers produce richer output structures, while finite-state parsers can be more efficient. We show how a finite-state parser can be used to produce dependency structures for most phrase types, with an $O(n^2)$ complexity in the number of words. The parser allows syntactically ambiguous structures to be packed into a single representation. The parser has been used as a component of a natural-language information retrieval system, operating in English and Japanese.

1 Dependency parsing and finite-state parsing

Recent work in parsing natural language has shown a shift away from the heavyweight theories of syntax and semantics, towards grammar formalisms which can be parsed efficiently and which make it easy to write robust grammars. Many of the lightweight techniques use either finite-state parsing, or some variant of dependency grammars. Finite-state parsing offers the advantages of computational efficiency, and the ability to easily integrate a number of levels of processing, from the phonological (Bird and Ellison, 1994) through to the syntactic (Roche, 1997; Roche and Schabes, 1997). Dependency grammars have a long history in linguistics and NLP, from the linguistically inclined work of Mel'čuk (1988), to computational implementations such as the dependency parser of Järvinen and Tapanainen (1997). A related kind of grammar is the Link grammar of Sleator and Temperley (1991).

The processing complexity of dependency grammars can be as low as $O(n^3)$ (Eisner, 1996), where $n$ is the number of words in the input. However, as Neuhaus and Bröker (1997) show, if certain sorts of linguistic construct are allowed, the processing complexity becomes exponential, up to being NP-complete. The specific constructs that cause this are non-projective ones, in which links between words must cross over, i.e. if there is a sequence of words $W_i \ldots W_j \ldots W_k$ in which $W_i$ and $W_k$ are linked, then $W_j$ has a link to a word preceding $W_i$ or following $W_k$. Neuhaus and Bröker show that such constructs can be found in English, in the form of topicalisation, and in German. Similar problems arise when the parser is allowed to tolerate ungrammatical input. For example, if $W_i$ and $W_k$ were spurious words in the input (say the
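
The crossing-link condition that defines non-projective structures can be checked mechanically. The following is a minimal sketch in Python, not taken from the paper: it assumes a dependency structure is given as a collection of (i, j) word-position pairs, and the function names `crossing_links` and `is_projective` are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def crossing_links(links):
    """Return every pair of links that cross over each other.

    `links` is an iterable of (i, j) word positions; the direction of
    a link is ignored. Two links (a, b) and (c, d), each written with
    the smaller position first, cross when exactly one endpoint of one
    link lies strictly between the endpoints of the other.
    """
    # Normalise each link to (left, right) and drop duplicates.
    spans = sorted({(min(i, j), max(i, j)) for i, j in links})
    crossings = []
    for (a, b), (c, d) in combinations(spans, 2):
        if a < c < b < d or c < a < d < b:
            crossings.append(((a, b), (c, d)))
    return crossings


def is_projective(links):
    """A structure is projective when no two of its links cross."""
    return not crossing_links(links)


if __name__ == "__main__":
    # Words 0 and 2 are linked, and word 1 links to word 3 beyond them.
    print(crossing_links([(0, 2), (1, 3)]))
    # Nested links do not cross.
    print(is_projective([(0, 3), (1, 2)]))
```

This sketch compares every pair of links, so it is quadratic in the number of links. It only illustrates the definition and makes no claim about how the parser described here detects or handles such constructs.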


Similar resources

Generating a Persian Phrase-Structure Treebank by Automatic Conversion

Treebanks are an important and useful resource in natural language processing tasks. Dependency and phrase structures are the two best-known kinds of treebank. Many efforts have already been made to convert dependency structure to phrase structure. In this paper we study an approach to converting dependency structure to phrase structure, motivated by the lack of a large phrase-structure treebank for Persian. A...


Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...


A Rule-Based Approach to Automatically Converting Dependency Parse Trees to Phrase-Structure Parse Trees for Persian

This paper introduces an automatic method for converting a dependency parse tree into an equivalent phrase-structure tree for the Persian language. In the first step, a rule-based algorithm was designed. Then, Persian-specific dependency-to-phrase-structure conversion rules were merged into the algorithm. Subsequently, the Persian dependency treebank with about 30,000 sentences was used as an input ...


Enriching the Output of a Parser Using Memory-based Learning

We describe a method for enriching the output of a parser with information available in a corpus. The method is based on graph rewriting using memory-based learning, applied to dependency structures. This general framework allows us to accurately recover both grammatical and semantic information as well as non-local dependencies. It also facilitates dependency-based evaluation of phrase structur...


Stacking of Dependency and Phrase Structure Parsers

We investigate the stacking of dependency and phrase structure parsers, i.e. we define features from the output of a phrase structure parser for a dependency parser and vice versa. Our features are based on the original form of the external parses and we also compare this approach to converting phrase structures to dependencies then applying standard stacking on the converted output. The propos...



Publication date: 1999